Many non-parametric techniques, such as Neural Networks (NN), are used to
forecast the current reservoir water level (RWLt). However, models built with
these techniques can be established without knowledge of the mathematical
relationship between the inputs and the corresponding outputs. Another
important issue in forecasting is the preprocessing stage, where most
non-parametric techniques normalize data into discretized data. Data
normalization can influence the results of forecasting. This paper presents
reservoir water level (RWL) forecasting using normalization and multiple
regression. In this study, continuous data of rainfall (RF) and changes of
reservoir water level (WC) are normalized using two different normalization
methods, Min-Max and Z-Score. The comparative study and the forecasting
process are carried out using multiple regression. Three input scenarios for
multiple regression were designed, comprising temporal patterns of WC and RF
to which the sliding window technique was applied. The experimental results
showed that the best input scenario for forecasting the RWLt employs both the
RF and the WC, with the best predictors being three days' delay of WC and two
days' delay of RF. The findings also suggested that the performance of the RWL
forecasting model using multiple regression depended on the normalization
method.


Copyright © 2019 Institute of Advanced Engineering and Science. 
All rights reserved. 





Corresponding Author: 


Siti Rafidah M-Dawam, 

Faculty of Computer and Mathematical Sciences, 
Universiti Teknologi MARA Kedah, 

P.O. Box 187, 08400 Merbok, Kedah, Malaysia. 
Email: srafidah192@kedah.uitm.edu.my








1. INTRODUCTION 

Forecasting RWL is crucial for a reservoir operator in making decisions on the reservoir water
release (RWR) of a particular reservoir. It is a challenging and complex task, especially during floods and
droughts, because of unpredictable inflow such as RF [1]. Thus, several studies have focused on
non-structural approaches to predicting reservoir inflows [2]. However, during flood or drought, the decision
on RWR is based not only on the availability of water inflows, but also on the previous release, demands,
time, etc. Besides daily RF, several studies also considered changes in the RWL (WC) as an input in the
multipurpose reservoir forecasting model [2]. RF (hydrological data) and reservoir WC are found to be
correlated in the flood prediction model [3].

Many studies of RWR operation have used RF data and RWL as inputs [4], and have applied
different Artificial Intelligence and machine learning methods and techniques [5-8]. Only a small number
of studies on RWR decisions have highlighted the time delay between the RF and the increase of RWL.

In [9], discretized data were normalized using the Min-Max technique. The results showed an
eight-day time lag relating upstream RF and RWL with a 24-15-3 ANN model. Later, the model
recommended a five-day time lag with an 8-23-2 ANN model and a 0.007085% error. Type 2 SVM regression
was used in [2] to forecast the daily RWL of the Klang reservoir, Malaysia. That study employed the
Z-Score technique for data normalization and found that the best input variables are a combination of RF
and RWL, with a best time lag of two days of RF and a 1.64% error. An Autoregressive Integrated Moving
Average (ARIMA) model was developed in [4] for predicting the daily water levels of the Kainji Dam,
Nigeria, using a ten-year record. The best-predicting model had a relative error of 0.039%. In [10], an ANN
with feedforward back propagation was concluded to be a suitable predictor for real-time water level
forecasting of the Sukhi Reservoir, India. The inputs were the daily data of inflow, RWL and RWR, and the
best time lag was ten days with a 0.82% error. NN was also employed in [11] to predict RWL, with a 5-25-1
NN model concluded as the best architecture. That study found that five days' observations of RWL are
significant for the RWR decision, with a 0.038756% error. A 4-17-1 NN architecture for forecasting the
change of RWL stage was proposed in [3]. The input patterns were the changes and stages of RWL instead of
the real value of RWL. The research showed that the changes in the stages of RWL were influenced by two
days of delay. However, modelling with NN techniques can be established without knowledge of the
mathematical relationship between the inputs and the corresponding outputs. Multiple regression, by
contrast, is used to explore the relationship between one continuous dependent variable (DV) and a number
of independent variables (IVs) or predictors (usually continuous). It can determine how well a set of
variables is able to predict a particular outcome [12-18]. This study applied multiple regression to identify
which IVs (slices of RWL and RF) are the best input predictors of the DV (RWLt).

Another important issue in forecasting arises during the preprocessing phase, where most
non-parametric techniques normalize data into discretized data. Data normalization can influence the
results of forecasting. Normalization can be performed at the level of the input features or at the level of
the kernel [19]. In many applications, the available features are continuous values, where each feature is
measured on a different scale and has a different range of possible values. In such cases, it is often
beneficial to scale all features to a common range by standardizing the data. The previous studies
mentioned above did not report any comparative study of the normalization methods they used. In
[19-22], normalization increased the classification accuracy, while for certain datasets normalization
may not demonstrate significant advantages [23].

In RWL forecasting, the data take the form of temporal sequences, where time (month, day or
hour) is critical [24]. Changes in the patterns of the data can influence decision-making. The
Temporal Data Mining (TDM) technique is required to uncover, from temporal sequences, the values of the
attributes that represent temporal information related to certain decisions through the algorithm
formulation. The significant time delay between the cause of an event and the actual event needs to be
captured accurately. Several studies have reported on the use of temporal data in forecasting [3], [11], [25-33].

This paper presents reservoir water level (RWL) forecasting using normalization and multiple
regression. In this study, continuous data of RF and changes of reservoir water level (WC) are normalized
using two different normalization methods, Min-Max and Z-Score. The comparative study and the
forecasting process are carried out using multiple regression. Three input scenarios for multiple regression
were designed, comprising temporal patterns of WC and RF. The sliding window technique was used to
capture the delay in the temporal data. The experimental results showed that the best input scenario for
forecasting the RWLt employs both the RF and the WC, with the best predictors being three days' delay of
WC and two days' delay of RF. The findings also suggested that the performance of the RWL forecasting
model using multiple regression depended on the normalization method. Root Mean Square Error (RMSE),
Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) were used as the measures for
evaluating the forecast results against the actual data.


2. RESEARCH METHOD 

Figure 1 depicts the approach used in conducting the research. The reservoir data, which consist
of RF and RWL from 1997 until 2006, were collected from the Department of Irrigation and Drainage
(DID), which is in charge of monitoring and managing the Timah Tasoh reservoir. This reservoir is one of
the largest multipurpose reservoirs in northern Peninsular Malaysia. The data consist of operational and
hydrological data. The operational data contain the daily RWLs measured in metres (m), while the
hydrological data contain the daily RF readings measured in millimetres (mm), recorded at five gauging
stations.

Figure 1. The process flow for RWL forecasting


In the data preparation stage, the attributes were described and records with missing values were
interpolated. This study used the RWL as the output, while the changes of the reservoir water level (WC)
and RF were used as the inputs. The WC is calculated using equation (1) [3]:

$$WC_t = RWL_t - RWL_{t-1} \tag{1}$$

where $WC_t$ is the change of RWL at current time $t$, $RWL_t$ is the RWL at current time $t$ and
$RWL_{t-1}$ is the RWL on the previous day $t-1$. The RF data are averaged over the number of stations
that recorded RF, based on [30], as in equation (2):

$$\text{Average RF} = \frac{\text{total\_rain}}{\text{number\_of\_stations\_with\_rain}} \tag{2}$$
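
Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names and the five station readings are hypothetical and chosen only for illustration; the three RWL values are taken from Table 1. Returning 0.0 for a day on which no station records rain is an assumption, since the source does not state how such days are handled.

```python
def water_level_changes(rwl):
    """Equation (1): WC_t = RWL_t - RWL_{t-1} for each day after the first."""
    return [curr - prev for prev, curr in zip(rwl, rwl[1:])]


def average_rainfall(station_readings):
    """Equation (2): total rain divided by the number of stations with rain."""
    wet = [r for r in station_readings if r > 0]
    # Assumption: a day with no rain at any station averages to 0.0.
    return sum(wet) / len(wet) if wet else 0.0


# Daily RWL in metres for 12, 13 and 14 February 1997 (Table 1).
changes = water_level_changes([29.275, 29.335, 29.335])

# Hypothetical readings (mm) from five gauging stations on one day.
avg_rf = average_rainfall([10.0, 0.0, 20.0, 0.0, 30.0])
```

Only the three stations that recorded rain enter the divisor, so the hypothetical day above averages over three readings rather than five.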








Next, the change-point detection technique is applied: only records with a gate opening decision
are extracted [34], while records with a gate closing decision are removed. A total of 501 records were
detected from ten years of reservoir operation (1997-2006).

The RF and WC data used in this study are temporal data with time-delayed events. The changes in
RWL are the impact of several sequential RF events. In order to capture the temporal information of WC
and RF, the sliding window technique is applied [34]. Figure 2 shows the pseudo-code for the sliding
window, where n is the size of the window. In this study, n is set to seven in order to investigate the effect
of the seven previous events on the current RWL [35], as shown in Table 1 and Table 2.





for time t to end of file
    read data at time t
    get data at (t-1)...(t-n)
    add into window slices set
next


Figure 2. Steps for Sliding Window 
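
The steps in Figure 2 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the authors' implementation; the function name `sliding_window` is hypothetical. The input series is the daily average RF read from Table 2, reordered from oldest to newest.

```python
def sliding_window(series, n=7):
    """Return (current value, [lag 1, ..., lag n]) for every t with n prior values."""
    rows = []
    for t in range(n, len(series)):
        lags = [series[t - k] for k in range(1, n + 1)]  # values at t-1, t-2, ..., t-n
        rows.append((series[t], lags))
    return rows


# Daily average RF (mm), oldest first, ending on 14-Feb-97 (values from Table 2).
average_rf = [10.0, 24.5, 46.25, 0.0, 13.0, 5.38, 7.33, 20.25, 13.875, 8.25]

for current, lags in sliding_window(average_rf, n=7):
    print(current, lags)
```

Each printed row corresponds to one row of Table 2: the current value followed by the values at t-1 to t-7. The first n days produce no row because they lack a full window.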


Table 1. Sliced Reservoir WC

Date       RWLt    WCt-1  WCt-2  WCt-3  WCt-4  WCt-5  WCt-6  WCt-7
12-Feb-97  29.275  0.020  0.035  0.055  0.035  0.025  0.150  0.005
13-Feb-97  29.335  0.060  0.020  0.035  0.055  0.035  0.025  0.150
14-Feb-97  29.335  0.000  0.060  0.020  0.035  0.055  0.035  0.025

Table 2. Sliced RF

Date       Average_RF  RFt-1   RFt-2   RFt-3   RFt-4   RFt-5   RFt-6   RFt-7
12-Feb-97  20.250      7.330   5.380   13.000  0.000   46.250  24.500  10.000
13-Feb-97  13.875      20.250  7.330   5.380   13.000  0.000   46.250  24.500
14-Feb-97  8.250       13.880  20.250  7.330   5.380   13.000  0.000   46.250





In the next stage, the reservoir WC and RF are normalized, that is, the attribute data are scaled so as
to fall within a small specified range. In a real application, because of the differences in the ranges of the
attributes' values, one attribute might overpower another. Normalization prevents attributes with a large
range from outweighing the others. The goal is to equalize the size or magnitude and the variability of
these attributes. There are many types of data normalization; however, only two techniques are compared
in this study: Z-Score and Min-Max normalization.

In Z-Score normalization, the values of the reservoir WC and RF attributes are normalized based
on the mean and standard deviation. The transformation is given by equation (3):

$$Z = \frac{X - \bar{X}}{SD} \tag{3}$$

where $X$ is the actual value of an attribute, $\bar{X}$ is the mean of the attribute and $SD$ is the
standard deviation of the attribute. This method of normalization is useful if the actual minimum and
maximum values of the attributes are unknown. The advantage of this statistical norm is that it reduces
the effects of outliers in the data. Table 3 and Table 4 show the normalized WC and RF using the Z-Score
technique.


Table 3. Z-Score of Reservoir WC

Date       zRWLt  zWCt-1  zWCt-2  zWCt-3  zWCt-4  zWCt-5  zWCt-6  zWCt-7
12-Feb-97  0.694  0.266   0.292   0.393   0.148   0.017   1.310   -0.204
13-Feb-97  0.908  0.627   0.156   0.207   0.337   0.116   0.003   1.349
14-Feb-97  0.908  0.086   0.519   0.067   0.148   0.314   0.108   0.010





Table 4. Z-Score of RF

Date       zRFt    zRFt-1  zRFt-2  zRFt-3  zRFt-4  zRFt-5  zRFt-6  zRFt-7
12-Feb-97  0.433   -0.463  -0.617  -0.192  -1.039  1.938   0.556   -0.351
13-Feb-97  0.022   0.298   -0.503  -0.642  -0.191  -1.038  1.979   0.605
14-Feb-97  -0.340  -0.077  0.254   -0.527  -0.688  -0.201  -1.045  2.049
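
Z-Score normalization of a single attribute column can be sketched as below. The function name is hypothetical, and the use of the sample standard deviation is an assumption, because the source does not say whether the sample or the population form was used. The short input list is made-up data, so the output is not expected to reproduce Table 3 or Table 4, which were computed over the full data set.

```python
import statistics


def z_score(values):
    """Rescale one attribute column to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 in the divisor)
    return [(v - mean) / sd for v in values]


# Made-up attribute values, for illustration only.
normalized = z_score([2.0, 4.0, 6.0])
```

After the transformation the column has mean 0 and standard deviation 1, so the values are expressed in units of standard deviations from the mean.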





The second technique is Min-Max normalization. This method rescales the attributes or outputs
from one range of values to a new range of values. The attributes are rescaled to lie within a range of 0 to 1
or from -1 to 1. Rescaling to the range 0 to 1 is accomplished using equation (4):

$$M_{new} = \frac{M - M_{min}}{M_{max} - M_{min}} \tag{4}$$

where $M$ is the actual value of an attribute, and $M_{min}$ and $M_{max}$ are the minimum and
maximum values of that attribute. This method has the advantage of preserving exactly all relationships
in the data. Table 5 and Table 6 show the normalized WC and RF using the Min-Max technique.


Table 5. Min-Max of Reservoir WC

Date       mRWLt   mWCt-1  mWCt-2  mWCt-3  mWCt-4  mWCt-5  mWCt-6  mWCt-7
12-Feb-97  0.5838  0.2735  0.2863  0.3034  0.2947  0.2863  0.3918  0.2694
13-Feb-97  0.6185  0.3076  0.2735  0.2863  0.3116  0.2947  0.2863  0.3918
14-Feb-97  0.6185  0.2564  0.3076  0.2735  0.2947  0.3116  0.2947  0.2863

Table 6. Min-Max of RF

Date       mRFt    mRFt-1  mRFt-2  mRFt-3  mRFt-4  mRFt-5  mRFt-6  mRFt-7
12-Feb-97  0.1387  0.0502  0.0368  0.0890  0.0000  0.3846  0.2037  0.0831
13-Feb-97  0.0950  0.1387  0.0502  0.0368  0.1081  0.0000  0.3846  0.2037
14-Feb-97  0.0565  0.0951  0.1387  0.0502  0.0447  0.1081  0.0000  0.3846
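
Min-Max rescaling of one attribute column can be sketched as follows. The function name and the `new_min` and `new_max` parameters are hypothetical; they generalize the 0 to 1 case so that the -1 to 1 range mentioned in the text is also covered. The input list is made-up data, not the reservoir record.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale one attribute column to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant column cannot be min-max scaled")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Made-up attribute values, for illustration only.
unit_range = min_max([2.0, 4.0, 6.0])             # rescaled to 0..1
signed_range = min_max([2.0, 4.0, 6.0], -1, 1)    # rescaled to -1..1
```

Because the mapping is linear, the ordering and the relative spacing of the values are unchanged, which is the sense in which relationships in the data are preserved.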





Multiple regression is used to explore the relationship between one continuous dependent variable 
(DV) and a number of independent variables (IVs) or predictors (usually continuous). It can determine how 
well a set of variables is able to predict a particular outcome. The regression equation (5) takes the following 
form: 


$$Y' = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k \tag{5}$$


where $Y'$ is the predicted value of the DV, $A$ is the intercept, the $X$s represent the various IVs, and
the $B$s are the coefficients assigned to each of the IVs during regression.
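
A regression of the form in equation (5) can be fitted by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination using only the Python standard library. It is a generic illustration and not the statistical package used in the study; the function names and the small data set are hypothetical.

```python
def fit_multiple_regression(X, y):
    """Return [A, B_1, ..., B_k] that minimize the squared error of equation (5)."""
    rows = [[1.0] + list(x) for x in X]  # the leading 1 carries the intercept A
    p = len(rows[0])
    # Augmented normal equations: (X^T X | X^T y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coeffs, x):
    """Evaluate Y' = A + B_1 X_1 + ... + B_k X_k for one observation."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 without noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]
coeffs = fit_multiple_regression(X, y)
```

With noise-free data the fitted coefficients recover the generating intercept and slopes up to rounding error. With real data the same routine returns the least-squares estimates.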

The output for this study is the RWLt and the inputs are reservoir WC and RF. This study designed
three different input scenarios for multiple regression in order to identify which input scenario (IVs) gives
the best predictors for forecasting RWLt (DV). The first scenario considers the daily RF between time (t-1)
and (t-7) as the sole input, while the second scenario considers both the RF (at t-1 to t-7) and reservoir WC
(at t-1 to t-7) as inputs. The third scenario uses only the reservoir WC between time (t-1) and (t-7) as inputs.
Equations (6), (7) and (8) represent the first, second and third scenarios, respectively.


$$RWL_t = f(RF_{t-i}), \quad i \in \{1, 2, 3, 4, 5, 6, 7\} \tag{6}$$

$$RWL_t = f(RF_{t-i}, WC_{t-j}), \quad i \in \{1, 2, 3, 4, 5, 6, 7\}, \; j \in \{1, 2, 3, 4, 5, 6, 7\} \tag{7}$$

$$RWL_t = f(WC_{t-i}), \quad i \in \{1, 2, 3, 4, 5, 6, 7\} \tag{8}$$


3. RESULTS AND ANALYSIS 

In this section, the results of the study are discussed in terms of input scenario and data
normalization technique. The best input scenario is determined before proceeding to the forecasting
calculation. Based on the statistical tests in Table 7, the forecasts obtained with the second input scenario
are better than those of the other two scenarios. This scenario employs more input data, thus providing a
better forecasting estimate. Its R² of 0.319 is greater than those of the first and third scenarios, which are
0.193 and 0.279 respectively. The second input scenario also has the smallest standard error of estimate
(SEE) for both normalization methods: 0.13588 for the Min-Max technique and 0.83856 for the Z-Score
technique. Therefore, the second input scenario is used as the input set for further data runs.


Table 7. Statistical Test for Three Input Scenarios

Input Scenario  R      R²     SEE (Min-Max Technique)  SEE (Z-Score Technique)
First           0.440  0.193  0.14673                  0.90548
Second          0.565  0.319  0.13588                  0.83856
Third           0.528  0.279  0.13872                  0.85607
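
The quantities reported in Table 7 can be computed from observed and fitted values as sketched below. The formulas are the standard textbook definitions, with the SEE taken as the square root of the residual sum of squares divided by n - k - 1 for k predictors. The source does not state its exact formulas, so this is an assumption. The function names and the sample values are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def standard_error_of_estimate(actual, predicted, k):
    """SEE for a regression with k predictors fitted on n observations."""
    n = len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(ss_res / (n - k - 1))


# Made-up observed and fitted values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(actual, fitted)
see = standard_error_of_estimate(actual, fitted, k=1)
```

The multiple correlation coefficient R in Table 7 is the square root of R² (for example, 0.565 squared is approximately 0.319).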





The sliding window technique was successfully applied to the RWL data to extract and segment
the temporal data while preserving the delay. Multiple regression showed that the best time lag for
forecasting RWLt is three days' delay of reservoir WC and two days of RF. Based on this finding, two
regression models for RWLt were developed in order to investigate which normalization technique
produces less error. The first regression model used Min-Max normalization, while the second model used
Z-Score normalization, as shown in equations (9) and (10):


$$RWL_t = 0.175 + 0.375\,mWC_{t-2} + 0.228\,mWC_{t-3} + 0.358\,mWC_{t-4} + 0.172\,mRF_{t-1} + 0.183\,mRF_{t-2} \tag{9}$$


$$RWL_t = 0.00 + 0.218\,zWC_{t-2} + 0.129\,zWC_{t-3} + 0.197\,zWC_{t-4} + 0.123\,zRF_{t-1} + 0.132\,zRF_{t-2} \tag{10}$$
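
As a worked illustration, the Min-Max model of equation (9) can be evaluated for one day. The lag subscripts used here (WC at t-2, t-3 and t-4; RF at t-1 and t-2) are read from the damaged equation text and should be treated as an assumption. The input values are the 12-Feb-97 rows of Table 5 and Table 6, and the dictionary keys and function name are hypothetical.

```python
# Coefficients of equation (9); the lag subscripts are an assumed reading
# of the damaged source text.
COEFFS_MIN_MAX = {
    "intercept": 0.175,
    "mWC_t-2": 0.375,
    "mWC_t-3": 0.228,
    "mWC_t-4": 0.358,
    "mRF_t-1": 0.172,
    "mRF_t-2": 0.183,
}


def forecast_rwl(features, coeffs=COEFFS_MIN_MAX):
    """Evaluate the linear model: intercept plus the sum of coefficient * predictor."""
    return coeffs["intercept"] + sum(coeffs[name] * value
                                     for name, value in features.items())


# Min-Max normalized predictors for 12-Feb-97 (Tables 5 and 6).
row = {
    "mWC_t-2": 0.2863,
    "mWC_t-3": 0.3034,
    "mWC_t-4": 0.2947,
    "mRF_t-1": 0.0502,
    "mRF_t-2": 0.0368,
}
forecast = forecast_rwl(row)
```

The forecast is on the Min-Max scale, so it is compared with the normalized observation mRWLt for that date (0.5838 in Table 5) rather than with the level in metres.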

Two data sets, one for each normalization technique, were tested using the two regression
models developed. Four statistical measures were selected to evaluate the forecasting efficiency in this
study, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage
Error (MAPE) and the Correlation Coefficient (R). The comparison of the two normalization techniques is
shown in Table 8. The RMSE, MAPE and MAE values obtained using the Min-Max technique are 0.14125,
0.24191 and 0.11122 respectively, while those obtained using the Z-Score technique are 0.87165, 6.90884
and 0.68677 respectively. All of the RMSE, MAE and MAPE values obtained using Min-Max normalization
are closer to 0 than those obtained using the Z-Score technique, indicating that the Min-Max technique is
better than Z-Score. However, the Z-Score technique gives a slightly greater correlation coefficient
(R = 0.48858) than the Min-Max technique (R = 0.48856). Overall, forecasting using Min-Max
normalization yields less error than using the Z-Score technique, and the predicted output using Min-Max
normalization is more reliable than that of the Z-Score normalization technique.


Table 8. Comparison of Statistical Evaluation for Normalization Technique

Normalization Technique  RMSE     MAPE     MAE      R
Min-Max                  0.14125  0.24191  0.11122  0.48856
Z-Score                  0.87165  6.90884  0.68677  0.48858
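
The three error measures in Table 8 can be sketched with their standard definitions. Two points are assumptions rather than statements from the source: MAPE is returned here as a percentage, and observations whose actual value is 0 are skipped, because the percentage error is undefined for them. The function names and sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumption: pairs with an actual value of 0 are skipped, since the
    percentage error is undefined there.
    """
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)


# Made-up values, for illustration only.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE and MAE are expressed in the units of the forecast variable, so their size depends on the scale on which that variable is measured.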





4. CONCLUSION 

This paper has presented reservoir water level (RWL) forecasting using normalization and multiple
regression. The comparison of input scenarios for multiple regression concludes that the best input
scenario is the second one, which combines RF and WC data.

The sliding window technique was successfully applied to the RWL data to extract and segment
the temporal data while preserving the delay. Multiple regression showed that the best time lag for
forecasting RWLt is three days' delay of reservoir WC and two days of RF.

The comparative study of the two normalization methods on the Timah Tasoh reservoir data using
multiple regression showed that data normalized using the Min-Max technique can enhance the reliability
of the forecasting model for RWLt. Forecasting using the Min-Max technique yields less error than using
the Z-Score technique, and the predicted output is more reliable. The experimental results showed that
the prediction of the RWLt using multiple regression depended on the normalization method used.

In the future, other input variables such as sediment, volume of water release and spatial effect can
be explored to improve the forecasting model of RWLt. Other statistical normalization methods, such as
median, sigmoid and statistical column normalization, can also be compared.